Quick check of IBDMDB metagenomics data subset

Author

David Barnett

Published

March 26, 2024

Code
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
here() starts at /Users/david/Documents/github/my_projects/teaching/workshops/2024-nutrim/ibdmdb-data-wrangling
Code
Registered S3 method overwritten by 'seriation':
  method         from 
  reorder.hclust vegan
Code
microViz version 0.12.1 - Copyright (C) 2021-2024 David Barnett
! Website: https://david-barnett.github.io/microViz
✔ Useful?  For citation details, run: `citation("microViz")`
✖ Silence? `suppressPackageStartupMessages(library(microViz))`

Intro

  • Focusing on 1 sample per participant fairly arbitrarily chosen as first in visit order
  • Checking for any overall compositional differences between Controls and UC and CD patients
  • Warnings about lack of counts I’ll fix in microViz as they’re annoying when the metaphlan profiles are proportions

Get combined data

Code
# also fix/delete messy variables that produce variable name warnings
ps <- read_rds(here("data", "phyloseq.rds"))

ps <- ps %>% 
  ps_mutate(
    ever_bowel_surgery = ..6..Have.you.ever.had.bowel.surgery.,
    ..6..Have.you.ever.had.bowel.surgery. = NULL,
    diarrhea_2w = ..4..In.the.past.2.weeks..have.you.had.diarrhea.,
    hospitalised_2w = ..5..In.the.past.2.weeks..have.you.been.hospitalized.,
    diagnosis = factor(diagnosis, levels = c("CD", "UC", "nonIBD"))
    ) %>% 
  ps_select(!contains("In.the.past.2.weeks")) 
  • Chuck bad samples (low reads)
Code
ps %>% samdat_tbl() %>% 
  ggplot(aes(x = reads_raw)) + 
  geom_histogram(bins = 100) + 
  coord_flip() + 
  scale_x_log10()

Code
ps <- ps %>% ps_filter(reads_raw > 10^5)
  • Compute alpha diversity
Code
ps <- ps %>% ps_calc_diversity(
  rank = "Species", index = "shannon", varname = "shannon_species"
)
Code
# take only first sample per person (by visit order)
ps1 <- ps %>%
  ps_arrange(visit_num) %>% 
  ps_dedupe(vars = "Participant.ID", method = "first")
130 groups: with 1 to 26 samples each
Dropped 1496 samples.

Quick plots and analyses

Compositions

  • No obvious differences between groups but quite some variation to explore and discuss.
Code
ps1 %>%
  comp_barplot(
    tax_level = "Genus", facet_by = "diagnosis",
    n_taxa = 13, merge_other = FALSE, 
  ) +
  coord_flip()
Warning: otu_table of counts is NOT available!
otu_table of counts is NOT available!
otu_table of counts is NOT available!
Available otu_table contains 3922 values that are not non-negative integers

Code
cols <- distinct_palette(n = 3, add = NA)
names(cols) <- levels(samdat_tbl(ps1)$diagnosis)

ps1 %>%
  ps_mutate(Diagnosis = as.character(diagnosis)) %>% 
  tax_agg(rank = "Species") %>% 
  tax_sort(by = sum, trans = "compositional") %>% 
  tax_transform("clr", rank = "Species") %>% 
  comp_heatmap(
    taxa = 1:40, taxa_side = "right",
    tax_anno = taxAnnotation(
      Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
    ),
    sample_anno = sampleAnnotation(
      Diagnosis = anno_sample("Diagnosis"),
      col = list(Diagnosis = cols), border = FALSE
  )
)
Warning: otu_table of counts is NOT available!
otu_table of counts is NOT available!
otu_table of counts is NOT available!
otu_table of counts is NOT available!
Available otu_table contains 6787 values that are not non-negative integers

CLR PCA

Code
ps1 %>%
  tax_transform("clr", rank = "Species") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "diagnosis") +
  stat_ellipse(aes(colour = diagnosis))
Warning: otu_table of counts is NOT available!
Available otu_table contains 6787 values that are not non-negative integers

Code
# interactive
# ps1 %>%
#   tax_transform("clr", rank = "Species") %>%
#   ord_calc("PCA") %>%
#   ord_explore()
Code
ps1 %>%
  tax_transform("clr", rank = "Species") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "diagnosis", axes = c(1,3)) +
  stat_ellipse(aes(colour = diagnosis))
Warning: otu_table of counts is NOT available!
Available otu_table contains 6787 values that are not non-negative integers

Code
ps1 %>%
  tax_transform("clr", rank = "Species") %>%
  ord_calc("PCA") %>%
  ord_plot(colour = "diagnosis", axes = c(1,3), plot_taxa = 1:10)
Warning: otu_table of counts is NOT available!
Available otu_table contains 6787 values that are not non-negative integers

PERMANOVAs

  • There is a difference by Aitchison distance on Species level
Code
ps1 %>%
  tax_transform("clr", rank = "Species") %>%
  dist_calc("euclidean") %>%
  dist_permanova("diagnosis") %>% 
  perm_get()
Warning: otu_table of counts is NOT available!
Available otu_table contains 6787 values that are not non-negative integers
2024-03-26 14:24:22.061981 - Starting PERMANOVA with 999 perms with 1 processes
2024-03-26 14:24:22.208815 - Finished PERMANOVA
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
           Df SumOfSqs      R2      F Pr(>F)  
diagnosis   2     4553 0.02062 1.3368  0.031 *
Residual  127   216257 0.97938                
Total     129   220810 1.00000                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
ps1 %>%
  tax_transform("identity", rank = "Genus") %>%
  dist_calc("bray") %>%
  dist_permanova("diagnosis") %>% 
  perm_get()
Warning: otu_table of counts is NOT available!
Available otu_table contains 3922 values that are not non-negative integers
2024-03-26 14:24:22.271283 - Starting PERMANOVA with 999 perms with 1 processes
2024-03-26 14:24:22.412579 - Finished PERMANOVA
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
           Df SumOfSqs      R2      F Pr(>F)
diagnosis   2   0.5273 0.02289 1.4877  0.138
Residual  127  22.5080 0.97711              
Total     129  23.0354 1.00000              
Code
ps1 %>%
  tax_transform("binary", rank = "Species") %>%
  dist_calc("jaccard") %>%
  dist_permanova("diagnosis") %>% 
  perm_get()
Warning: otu_table of counts is NOT available!
Available otu_table contains 6787 values that are not non-negative integers
2024-03-26 14:24:22.505178 - Starting PERMANOVA with 999 perms with 1 processes
2024-03-26 14:24:22.882352 - Finished PERMANOVA
Permutation test for adonis under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
           Df SumOfSqs      R2      F Pr(>F)  
diagnosis   2    0.713 0.01989 1.2887  0.031 *
Residual  127   35.109 0.98011                
Total     129   35.821 1.00000                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Alpha diversity

Compare groups - timepoint 1

  • By eye the diversity of some but not all patient samples is lower than controls.
Code
ps1 %>%
  samdat_tbl() %>%
  ggplot(aes(x = diagnosis, y = shannon_species)) +
  geom_boxplot() +
  geom_jitter()

  • Colitis vs controls
Code
ps1 %>%
  samdat_tbl() %>%
  filter(diagnosis != "CD") %>%
  ggstatsplot::ggbetweenstats(
    x = diagnosis, y = shannon_species
  )
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

  • Crohn’s vs controls
Code
ps1 %>%
  samdat_tbl() %>%
  filter(diagnosis != "UC") %>%
  ggstatsplot::ggbetweenstats(
    x = diagnosis, y = shannon_species
  )
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

Check timepoint 1 relation to antibiotics etc?

Code
ps1 %>%
  samdat_tbl() %>%
  ggstatsplot::ggbetweenstats(x = Antibiotics, y = shannon_species)
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

Code
ps1 %>%
  samdat_tbl() %>%
  ggstatsplot::ggbetweenstats(x = diarrhea_2w, y = shannon_species)
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

Code
ps1 %>%
  samdat_tbl() %>%
  ggstatsplot::ggbetweenstats(x = hospitalised_2w, y = shannon_species)
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

Code
ps1 %>%
  samdat_tbl() %>%
  ggstatsplot::ggbetweenstats(x = ever_bowel_surgery, y = shannon_species)
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

Code
ps %>%
  samdat_tbl() %>%
  ggstatsplot::ggscatterstats(
    x = consent_age, y = shannon_species, 
    point.args = list(mapping = aes(colour = diagnosis))
  )
Registered S3 method overwritten by 'ggside':
  method from   
  +.gg   ggplot2
`stat_xsidebin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_ysidebin()` using `bins = 30`. Pick better value with `binwidth`.

Diversity over time

  • Just for my own interest now, because I haven’t done an IBD project before
  • Seems like a subset of IBD patients have persistently very low diversity
  • Diversity is a crude measure, compositional stability would be more interesting, but I should stop now..
Code
ps %>% 
  samdat_tbl() %>% 
  add_count(name = "n_samples", .by = Participant.ID) %>% 
  filter(n_samples > 3) %>% 
  mutate(mean_shannon = mean(shannon_species), .by = Participant.ID) %>% 
  ggplot(aes(x = visit_num, y = shannon_species)) +
  geom_hline(yintercept = 1.5, linetype = "dashed") +
  geom_line(aes(group = Participant.ID, colour = mean_shannon < 1.5), alpha = 0.5) +
  geom_smooth(method = "lm", colour = "black") +
  facet_wrap(facets = vars(diagnosis)) +
  guides(colour = "none") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'

Code
ps %>% 
  samdat_tbl() %>% 
  group_by(diagnosis, Participant.ID) %>% 
  summarise(
    n = n(), mean_shannon = mean(shannon_species),
    sd_shannon = sd(shannon_species)
  ) %>% 
  ungroup() %>% 
  filter(n > 5) %>% 
  ggplot(aes(colour = diagnosis, x = mean_shannon, y = sd_shannon)) +
  geom_vline(xintercept = 1.5) +
  geom_point(aes(shape = mean_shannon < 1.5)) +
  theme_bw()
`summarise()` has grouped output by 'diagnosis'. You can override using the
`.groups` argument.

Code
ps %>% 
  samdat_tbl() %>% 
  add_count(name = "n_samples", .by = Participant.ID) %>% 
  filter(n_samples > 3) %>% 
  mutate(mean_shannon = mean(shannon_species), .by = Participant.ID) %>% 
  ggplot(aes(
    x = reorder(Participant.ID, shannon_species, FUN = mean), 
    y = shannon_species
  )) +
  geom_hline(yintercept = 1.5) +
  geom_boxplot(aes(fill = mean_shannon < 1.5), outliers = FALSE) +
  geom_point(size = 1) +
  facet_grid(cols = vars(diagnosis), scales = "free_x", space = "fixed") +
  theme_bw() +
  theme(axis.text.x = element_blank())

Session info

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.4
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2024-03-26
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package          * version    date (UTC) lib source
 ade4               1.7-22     2023-02-06 [1] CRAN (R 4.3.0)
 ape                5.7-1      2023-03-13 [1] CRAN (R 4.3.0)
 BayesFactor        0.9.12-4.7 2024-01-24 [1] CRAN (R 4.3.1)
 bayestestR         0.13.2     2024-02-12 [1] CRAN (R 4.3.1)
 Biobase            2.62.0     2023-10-26 [1] Bioconductor
 BiocGenerics       0.48.1     2023-11-02 [1] Bioconductor
 biomformat         1.30.0     2023-10-26 [1] Bioconductor
 Biostrings         2.70.2     2024-01-30 [1] Bioconductor 3.18 (R 4.3.2)
 bitops             1.0-7      2021-04-24 [1] CRAN (R 4.3.0)
 ca                 0.71.1     2020-01-24 [1] CRAN (R 4.3.0)
 Cairo              1.6-2      2023-11-28 [1] CRAN (R 4.3.1)
 circlize           0.4.16     2024-02-20 [1] CRAN (R 4.3.1)
 cli                3.6.2      2023-12-11 [1] CRAN (R 4.3.1)
 clue               0.3-65     2023-09-23 [1] CRAN (R 4.3.1)
 cluster            2.1.6      2023-12-01 [1] CRAN (R 4.3.1)
 coda               0.19-4.1   2024-01-31 [1] CRAN (R 4.3.1)
 codetools          0.2-19     2023-02-01 [1] CRAN (R 4.3.2)
 colorspace         2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 ComplexHeatmap     2.18.0     2023-10-26 [1] Bioconductor
 correlation        0.8.4      2023-04-06 [1] CRAN (R 4.3.0)
 crayon             1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
 data.table         1.15.2     2024-02-29 [1] CRAN (R 4.3.1)
 datawizard         0.9.1      2023-12-21 [1] CRAN (R 4.3.1)
 digest             0.6.34     2024-01-11 [1] CRAN (R 4.3.1)
 doParallel         1.0.17     2022-02-07 [1] CRAN (R 4.3.0)
 dplyr            * 1.1.4      2023-11-17 [1] CRAN (R 4.3.1)
 effectsize         0.8.6      2023-09-14 [1] CRAN (R 4.3.0)
 evaluate           0.23       2023-11-01 [1] CRAN (R 4.3.1)
 fansi              1.0.6      2023-12-08 [1] CRAN (R 4.3.1)
 farver             2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastmap            1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 forcats          * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 foreach            1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
 generics           0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 GenomeInfoDb       1.38.7     2024-03-05 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData   1.2.11     2024-03-11 [1] Bioconductor
 GetoptLong         1.0.5      2020-12-15 [1] CRAN (R 4.3.0)
 ggplot2          * 3.5.0      2024-02-23 [1] CRAN (R 4.3.1)
 ggrepel            0.9.5      2024-01-10 [1] CRAN (R 4.3.1)
 ggside             0.3.1      2024-03-01 [1] CRAN (R 4.3.1)
 ggstatsplot        0.12.2     2024-01-14 [1] CRAN (R 4.3.1)
 GlobalOptions      0.1.2      2020-06-10 [1] CRAN (R 4.3.0)
 glue               1.7.0      2024-01-09 [1] CRAN (R 4.3.1)
 gtable             0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
 here             * 1.0.1      2020-12-13 [1] CRAN (R 4.3.0)
 hms                1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools          0.5.7      2023-11-03 [1] CRAN (R 4.3.1)
 htmlwidgets        1.6.4      2023-12-06 [1] CRAN (R 4.3.1)
 igraph             2.0.2      2024-02-17 [1] CRAN (R 4.3.1)
 insight            0.19.8     2024-01-31 [1] CRAN (R 4.3.1)
 IRanges            2.36.0     2023-10-26 [1] Bioconductor
 iterators          1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
 jsonlite           1.8.8      2023-12-04 [1] CRAN (R 4.3.1)
 knitr              1.45       2023-10-30 [1] CRAN (R 4.3.1)
 labeling           0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
 lattice            0.22-5     2023-10-24 [1] CRAN (R 4.3.1)
 lifecycle          1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
 lubridate        * 1.9.3      2023-09-27 [1] CRAN (R 4.3.1)
 magrittr           2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS               7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.1)
 Matrix             1.6-5      2024-01-11 [1] CRAN (R 4.3.1)
 MatrixModels       0.5-3      2023-11-06 [1] CRAN (R 4.3.1)
 matrixStats        1.2.0      2023-12-11 [1] CRAN (R 4.3.1)
 mgcv               1.9-1      2023-12-21 [1] CRAN (R 4.3.1)
 microbiome         1.24.0     2023-10-24 [1] Bioconductor
 microViz         * 0.12.1     2024-03-04 [1] Github (david-barnett/microViz@09abc73)
 multtest           2.58.0     2023-10-26 [1] Bioconductor
 munsell            0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 mvtnorm            1.2-4      2023-11-27 [1] CRAN (R 4.3.1)
 nlme               3.1-164    2023-11-27 [1] CRAN (R 4.3.1)
 paletteer          1.6.0      2024-01-21 [1] CRAN (R 4.3.1)
 parameters         0.21.5     2024-02-07 [1] CRAN (R 4.3.1)
 patchwork          1.2.0      2024-01-08 [1] CRAN (R 4.3.1)
 pbapply            1.7-2      2023-06-27 [1] CRAN (R 4.3.0)
 permute            0.9-7      2022-01-27 [1] CRAN (R 4.3.0)
 phyloseq         * 1.46.0     2023-10-24 [1] Bioconductor
 pillar             1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig          2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plyr               1.8.9      2023-10-02 [1] CRAN (R 4.3.1)
 png                0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
 prismatic          1.1.1      2022-08-15 [1] CRAN (R 4.3.0)
 purrr            * 1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6                 2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 RColorBrewer       1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 Rcpp               1.0.12     2024-01-09 [1] CRAN (R 4.3.1)
 RCurl              1.98-1.14  2024-01-09 [1] CRAN (R 4.3.1)
 readr            * 2.1.5      2024-01-10 [1] CRAN (R 4.3.1)
 registry           0.5-1      2019-03-05 [1] CRAN (R 4.3.0)
 rematch2           2.1.2      2020-05-01 [1] CRAN (R 4.3.0)
 reshape2           1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
 rhdf5              2.46.1     2023-12-02 [1] Bioconductor 3.18 (R 4.3.2)
 rhdf5filters       1.14.1     2023-12-16 [1] Bioconductor 3.18 (R 4.3.2)
 Rhdf5lib           1.24.2     2024-02-10 [1] Bioconductor 3.18 (R 4.3.2)
 rjson              0.2.21     2022-01-09 [1] CRAN (R 4.3.0)
 rlang              1.1.3      2024-01-10 [1] CRAN (R 4.3.1)
 rmarkdown          2.26       2024-03-05 [1] CRAN (R 4.3.1)
 rprojroot          2.0.4      2023-11-05 [1] CRAN (R 4.3.1)
 rstudioapi         0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
 Rtsne              0.17       2023-12-07 [1] CRAN (R 4.3.1)
 S4Vectors          0.40.2     2023-11-25 [1] Bioconductor 3.18 (R 4.3.2)
 scales             1.3.0      2023-11-28 [1] CRAN (R 4.3.1)
 seriation        * 1.5.4      2023-12-12 [1] CRAN (R 4.3.1)
 sessioninfo        1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 shape              1.4.6.1    2024-02-23 [1] CRAN (R 4.3.1)
 statsExpressions   1.5.3      2024-01-13 [1] CRAN (R 4.3.1)
 stringi            1.8.3      2023-12-11 [1] CRAN (R 4.3.1)
 stringr          * 1.5.1      2023-11-14 [1] CRAN (R 4.3.1)
 survival           3.5-8      2024-02-14 [1] RSPM (R 4.3.0)
 tibble           * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr            * 1.3.1      2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect         1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse        * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
 timechange         0.3.0      2024-01-18 [1] CRAN (R 4.3.1)
 TSP                1.2-4      2023-04-04 [1] CRAN (R 4.3.0)
 tzdb               0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8               1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
 vctrs              0.6.5      2023-12-01 [1] CRAN (R 4.3.1)
 vegan              2.6-4      2022-10-11 [1] CRAN (R 4.3.0)
 withr              3.0.0      2024-01-16 [1] CRAN (R 4.3.1)
 xfun               0.42       2024-02-08 [1] CRAN (R 4.3.1)
 XVector            0.42.0     2023-10-26 [1] Bioconductor
 yaml               2.3.8      2023-12-11 [1] CRAN (R 4.3.1)
 zeallot            0.1.0      2018-01-28 [1] CRAN (R 4.3.0)
 zlibbioc           1.48.0     2023-10-26 [1] Bioconductor

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────